
studio: extend offline DNS auto-detect to inference parent + training #5512

Merged
danielhanchen merged 6 commits into main from offline-extend-to-parent-and-training
May 18, 2026

@danielhanchen

Member

Summary

Follow-up to #5505. That PR fixed the GGUF/llama-server load path. Two adjacent Studio code paths still burn 30-60s of soft-failed network timeouts before the worker subprocess starts when DNS to huggingface.co is dead and the model is already in the local HF cache. This PR extends the same DNS auto-detect helper to both.

Inference parent process (FastAPI side, before worker spawn)

routes/inference.py:load_model now runs ModelConfig.from_identifier inside _hf_offline_if_dns_dead so the soft-failed network calls reached transitively from there short-circuit on dead DNS:

  • utils/models/model_config.py LoRA-detect hf_model_info(identifier, token=...) call (was ~25s timeout)
  • utils/models/model_config.py hf_hub_download(identifier, 'adapter_config.json', ...) for remote LoRAs (was ~25s timeout, now bails fast via LocalEntryNotFoundError)
  • utils/transformers_version.py _check_tokenizer_config_needs_v5 raw urllib.urlopen(...)/tokenizer_config.json (was ~10s timeout)
  • utils/transformers_version.py _check_config_needs_550 raw urllib.urlopen(...)/config.json (was ~10s timeout)

The inline env-var check used by list_gguf_variants and detect_gguf_model_remote (added in #5505) is extracted into a shared _env_offline() helper to avoid duplicating the truthy-value parsing across new call sites.
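
As a rough illustration of the wrap described above, a context manager of this kind can set HF_HUB_OFFLINE only for the duration of the load and then restore the previous state. This is a minimal sketch under stated assumptions, not the code in this PR: the function names, the injectable `probe` argument, and the restore logic are hypothetical, and the plain `gethostbyname` probe shown here has no deadline of its own.

```python
import contextlib
import os
import socket
from typing import Callable, Iterator

_TRUTHY = {"1", "true", "yes"}


def env_offline() -> bool:
    """True if either offline flag is set to a truthy value."""
    return any(
        os.environ.get(name, "").lower() in _TRUTHY
        for name in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
    )


def dns_dead(host: str = "huggingface.co") -> bool:
    """Hypothetical probe: True when the host name cannot be resolved."""
    try:
        socket.gethostbyname(host)
        return False
    except OSError:
        return True


@contextlib.contextmanager
def hf_offline_if_dns_dead(probe: Callable[[], bool] = dns_dead) -> Iterator[bool]:
    """Set HF_HUB_OFFLINE=1 inside the block when the probe reports dead DNS."""
    if env_offline() or not probe():
        # The user already opted in, or the network works: change nothing.
        yield False
        return
    previous = os.environ.get("HF_HUB_OFFLINE")
    os.environ["HF_HUB_OFFLINE"] = "1"
    try:
        yield True
    finally:
        # Restore the exact prior state so the override does not leak.
        if previous is None:
            os.environ.pop("HF_HUB_OFFLINE", None)
        else:
            os.environ["HF_HUB_OFFLINE"] = previous
```

A caller would wrap the slow lookup, for example `with hf_offline_if_dns_dead(): ...`, so that everything reached from inside the block sees the flag.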

Training subprocess (core/training/worker.py)

run_training_process now mirrors the DNS auto-detect already in core/inference/worker.py. On dead DNS, it sets HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before importing torch/transformers/unsloth, so every from_pretrained, snapshot_download, and load_dataset call further down resolves from cache. Scope is per-subprocess (the orchestrator spawns a fresh worker per training run).
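
The ordering matters because these libraries typically read the offline flags once, at import time. A minimal sketch of that ordering follows; the helper name and the injectable `probe` are assumptions made for illustration, and the presence check mirrors the guard quoted later in this thread rather than a recommended design.

```python
import os
from typing import Callable

OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def enable_offline_if_dns_dead(probe: Callable[[], bool]) -> bool:
    """Set the three offline flags when `probe` reports dead DNS.

    Returns True when this call set HF_HUB_OFFLINE.
    """
    if "HF_HUB_OFFLINE" in os.environ:
        # An existing value is left alone, whatever it is.
        return False
    if not probe():
        return False
    os.environ["HF_HUB_OFFLINE"] = "1"
    # setdefault keeps any value the user chose for the other two flags.
    os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
    os.environ.setdefault("HF_DATASETS_OFFLINE", "1")
    return True


def run_training_process(probe: Callable[[], bool]) -> None:
    """Hypothetical worker entry point showing where the probe belongs."""
    enable_offline_if_dns_dead(probe)
    # Heavy imports belong below this line, after the flags are set:
    #   import torch, transformers, unsloth
```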

core/training/trainer.py:load_model skips the proactive hf_model_info gated-repo probe when _env_offline() is true. The API is unreachable, and a gated model that is already cached is exactly the scenario the user is trying to train against. from_pretrained surfaces the real error if access is actually denied.

Behaviour

  • Online: unchanged. Every network call still happens first.
  • Offline (DNS dead): the inference load and the training start fall through to cache in seconds instead of 30-60s.
  • User-set HF_HUB_OFFLINE=1 or TRANSFORMERS_OFFLINE=1: preserved end-to-end (the context manager has respected this since #5505, "studio: load cached GGUF models when fully offline").

Test plan

  • studio/backend/tests/test_offline_inference_parent.py: 7 new cases covering _env_offline() parsing, transformers_version urllib short-circuit, LoRA-detect API skip.
  • studio/backend/tests/test_offline_gguf_cache_fallback.py: 26 existing cases still pass after the env-check extraction.
  • Combined run: 33 passed in 3.74s.
  • CI green on 3.10/3.11/3.12/3.13 backend matrix.

Stacks on #5505. Recommend merging that first.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request implements an offline fallback mechanism for Hugging Face Hub interactions by introducing DNS probing and local cache resolution for GGUF models. These changes prevent long network timeouts when huggingface.co is unreachable by automatically enabling offline modes and short-circuiting API calls. Feedback focuses on improving the implementation's thread safety by avoiding global socket timeout mutations, optimizing redundant path operations during cache scanning, and centralizing duplicated helper logic for DNS probing and environment checks.

Comment on lines +106 to +115
prev = socket.getdefaulttimeout()
socket.setdefaulttimeout(timeout)
try:
    try:
        socket.gethostbyname(host)
        return False
    except Exception:
        return True
finally:
    socket.setdefaulttimeout(prev)


medium

Modifying socket.setdefaulttimeout is not thread-safe as it affects the entire process. In the FastAPI parent process, concurrent requests could interfere with each other's timeout settings. A safer, thread-safe alternative for a reachability check is to use socket.create_connection((host, 443), timeout=timeout), which respects the timeout without mutating global state and supports both IPv4 and IPv6 addresses.

References
  1. Use socket.getaddrinfo() or high-level wrappers like socket.create_connection() to create sockets that support both IPv4 and IPv6 addresses, instead of hardcoding an address family.
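
The suggested alternative could look roughly like the sketch below. The function name and defaults are assumptions. Note that the `timeout` passed to `socket.create_connection` bounds the connection attempt; it does not by itself bound the name-resolution step that happens first, so a wedged resolver may still need a separate deadline.

```python
import socket


def host_unreachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """True when a TCP connection to (host, port) cannot be opened."""
    try:
        # The timeout applies to this connection attempt only; the
        # process-wide default from socket.setdefaulttimeout is untouched.
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        return True
```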

Comment on lines +1832 to +1837
matches = sorted(
    p.relative_to(snap).as_posix()
    for p in snap.rglob("*.gguf")
    if "mmproj" not in p.name.lower()
    and boundary.search(p.relative_to(snap).as_posix().lower())
)


medium

p.relative_to(snap).as_posix() is a relatively expensive path operation and is currently called twice for every file found by rglob. To improve efficiency, compute this value once and reuse the result.

matches = sorted(
    rel
    for p in snap.rglob("*.gguf")
    if "mmproj" not in p.name.lower()
    and (rel := p.relative_to(snap).as_posix())
    and boundary.search(rel.lower())
)

References
  1. When a condition or calculated value is used across multiple conditional branches, compute it once and reuse the result to ensure consistency and improve maintainability.
  2. To improve efficiency, avoid redundant data iterations and transformations.

Comment thread studio/backend/core/inference/worker.py Outdated
Comment on lines +656 to +670
if "HF_HUB_OFFLINE" not in os.environ:
    import socket as _socket

    prev_timeout = _socket.getdefaulttimeout()
    _socket.setdefaulttimeout(2.0)
    try:
        _socket.gethostbyname("huggingface.co")
    except Exception:
        os.environ["HF_HUB_OFFLINE"] = "1"
        os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
        logger.warning(
            "huggingface.co unreachable; HF_HUB_OFFLINE=1 set for this worker."
        )
    finally:
        _socket.setdefaulttimeout(prev_timeout)


medium

The DNS probe logic here is duplicated from _probe_dns_dead in core/inference/llama_cpp.py. You should centralize this logical check into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

References
  1. Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

Comment on lines +48 to +54
def _env_offline() -> bool:
    """True if HF_HUB_OFFLINE or TRANSFORMERS_OFFLINE is set to a truthy value."""
    return os.environ.get("HF_HUB_OFFLINE", "").lower() in (
        "1",
        "true",
        "yes",
    ) or os.environ.get("TRANSFORMERS_OFFLINE", "").lower() in ("1", "true", "yes")


medium

The _env_offline helper is duplicated in both utils/models/model_config.py and utils/transformers_version.py. Since model_config.py already imports from transformers_version.py, you should centralize this logic in transformers_version.py and import it here to improve maintainability and avoid logic drift.

References
  1. Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26053dd50b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# so offline loads don't burn ~25s waiting for the HF API to time out.
# If the repo really is a LoRA, the worker still resolves it from
# cache later via the same env var.
if not is_lora and not is_local and not _env_offline():


P2: Detect cached LoRA adapters while offline

When HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE is set for a cached remote LoRA repo, this skips the only path that flips is_lora to true; from_identifier() then falls through as if the adapter repo were a full base model, so the later LoRA block never downloads/reads cached adapter_config.json and the inference worker receives is_lora=False. This regresses offline loading of already-cached HF LoRA adapters; the offline path should still check the local HF cache for adapter_config.json (or otherwise honor cached adapter metadata) instead of bypassing detection entirely.
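
A cache-side check of the kind requested here can be sketched with the standard library alone. The function name is hypothetical, and the sketch assumes the usual hub cache layout of `models--<org>--<name>/snapshots/<revision>/<file>`; the repo id in the usage lines is made up.

```python
import tempfile
from pathlib import Path


def cached_repo_has_file(cache_dir: Path, repo_id: str, filename: str) -> bool:
    """True if any cached snapshot of `repo_id` contains `filename`."""
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return False
    # is_file() follows symlinks, which is how snapshots point at blobs.
    return any((snap / filename).is_file() for snap in snapshots.iterdir())


# Usage: stage a fake cache and look for an adapter config.
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "models--acme--demo-lora" / "snapshots" / "abc123"
    snap.mkdir(parents=True)
    (snap / "adapter_config.json").write_text("{}")
    found = cached_repo_has_file(Path(tmp), "acme/demo-lora", "adapter_config.json")
    missing = cached_repo_has_file(Path(tmp), "acme/other", "adapter_config.json")
```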


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcb30f1c94


Comment on lines +107 to +110
socket.setdefaulttimeout(timeout)
try:
    try:
        socket.gethostbyname(host)


P2: Bound DNS probes with a real timeout

When a user's resolver is blackholed or very slow, this new probe can block the FastAPI load path well beyond the intended 2 seconds because socket.setdefaulttimeout() does not bound socket.gethostbyname() name-resolution time. In that environment, every model load that enters _hf_offline_if_dns_dead() can hang before it reaches the cached/offline fallback, so use a resolver strategy with an actual deadline instead of relying on the socket default timeout.
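
One way to give the probe an actual deadline is to run the lookup on a daemon thread and wait on it with a join timeout. The sketch below is illustrative only: the function name and the injectable `resolver` argument are assumptions added so the behaviour can be exercised without a network.

```python
import socket
import threading
from typing import Callable, List


def probe_dns_dead(
    host: str = "huggingface.co",
    timeout: float = 2.0,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> bool:
    """True when `host` does not resolve within `timeout` seconds."""
    outcome: List[bool] = []  # receives True (dead) or False (alive)

    def _probe() -> None:
        try:
            resolver(host)
            outcome.append(False)
        except Exception:
            outcome.append(True)

    worker = threading.Thread(target=_probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # No outcome yet means the resolver is still blocked: treat as dead.
    return outcome[0] if outcome else True
```

The caller never waits longer than `timeout`, although a blocked lookup keeps its thread alive in the background until the resolver returns.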


Comment on lines +124 to +125
if "HF_HUB_OFFLINE" in os.environ:
    yield False


P2: Keep offline scopes isolated per request

When two load requests overlap while DNS is down, the second one can enter here while the first request's context has temporarily set HF_HUB_OFFLINE; it then yields without owning/restoring the setting, and the first request's finally later pops the variable while the second request is still running. Because environment variables are process-wide, that second request can unexpectedly fall back to normal HF network calls/timeouts mid-scope; guard this with a lock/refcount or avoid using process-wide env mutation for per-request state.
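
The lock/refcount option mentioned here might be sketched as follows. The class and method names are hypothetical; the idea is that only the first scope sets the variable and only the last scope restores it, so overlapping requests cannot pull the flag out from under each other.

```python
import contextlib
import os
import threading
from typing import Iterator, Optional


class OfflineScope:
    """Reference-counted HF_HUB_OFFLINE override for overlapping requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._depth = 0
        self._saved: Optional[str] = None

    @contextlib.contextmanager
    def hold(self) -> Iterator[None]:
        with self._lock:
            if self._depth == 0:
                # First scope in: remember the prior value, then override.
                self._saved = os.environ.get("HF_HUB_OFFLINE")
                os.environ["HF_HUB_OFFLINE"] = "1"
            self._depth += 1
        try:
            yield
        finally:
            with self._lock:
                self._depth -= 1
                if self._depth == 0:
                    # Last scope out: restore the prior state exactly.
                    if self._saved is None:
                        os.environ.pop("HF_HUB_OFFLINE", None)
                    else:
                        os.environ["HF_HUB_OFFLINE"] = self._saved
```

This still mutates process-wide state, so code outside any scope also sees the flag while a scope is active; it only removes the early-restore race.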


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84e3782ace


Comment on lines +49 to +53
return os.environ.get("HF_HUB_OFFLINE", "").lower() in (
    "1",
    "true",
    "yes",
) or os.environ.get("TRANSFORMERS_OFFLINE", "").lower() in ("1", "true", "yes")


P2: Honor ON in offline environment flags

When users set HF_HUB_OFFLINE=ON or TRANSFORMERS_OFFLINE=ON (standard truthy values for Hugging Face/Transformers), this helper returns False, so the new offline short-circuits below are skipped and _check_tokenizer_config_needs_v5 / _check_config_needs_550 still perform raw urllib fetches that can wait up to 10 seconds each before failing. This undermines the offline path added here for a supported env spelling; include on in the truthy set, and keep the duplicate offline helper consistent.
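
A single helper with the extended truthy set could look like the sketch below. The name `env_offline` and the injectable `environ` mapping are assumptions for illustration; the inclusion of `on` follows the claim in this comment rather than anything verified here.

```python
import os
from typing import Mapping

# Lower-cased spellings treated as "enabled"; "on" is added per the review.
_TRUTHY = frozenset({"1", "true", "yes", "on"})
_OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")


def env_offline(environ: Mapping[str, str] = os.environ) -> bool:
    """True if either offline flag is set to a truthy value."""
    return any(
        environ.get(name, "").strip().lower() in _TRUTHY
        for name in _OFFLINE_VARS
    )
```

Keeping the set in one module-level constant means both flags are parsed identically and a new spelling only has to be added once.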


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec15082487


Comment on lines +106 to +108
prev = socket.getdefaulttimeout()
socket.setdefaulttimeout(timeout)
try:


P2: Avoid mutating process-wide socket defaults in DNS probe

_probe_dns_dead() changes the global socket default timeout via socket.setdefaulttimeout(timeout) before resolving huggingface.co. Because this helper is used on the FastAPI parent load path, concurrent requests in the same process can create sockets during this window and unexpectedly inherit the 2s default, causing unrelated outbound calls to fail or time out early. Please switch to a probe mechanism that enforces its own deadline without touching process-wide socket defaults.


@danielhanchen danielhanchen force-pushed the offline-extend-to-parent-and-training branch from ec15082 to cf0fc65 Compare May 18, 2026 00:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 003761caf2


Comment on lines +116 to +120
t = threading.Thread(target = _probe, daemon = True)
t.start()
t.join(timeout)
# Thread still running -> resolver wedged -> treat as dead.
return True if result[0] is None else result[0]


P2: Reuse a bounded DNS probe thread

When DNS resolution hangs (the _probe thread never returns), join(timeout) exits but the daemon thread keeps running, and each call to this helper spawns another stuck thread. Because _hf_offline_if_dns_dead() is now used on model-load paths in the long-lived inference parent process, repeated loads during a resolver outage can accumulate blocked threads and eventually hit thread/memory limits, degrading or breaking further requests.
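
One possible shape for this is a small single-flight wrapper: at most one probe thread is in flight at a time, and a recent result is reused for a short period. Everything in the sketch (class name, `ttl`, the injectable `resolve` callable) is an assumption for illustration, not the fix adopted in this PR.

```python
import threading
import time
from typing import Callable, Optional


class CachedProbe:
    """Run at most one probe thread at a time and reuse recent results."""

    def __init__(self, resolve: Callable[[], object],
                 timeout: float = 2.0, ttl: float = 30.0) -> None:
        self._resolve = resolve
        self._timeout = timeout
        self._ttl = ttl
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._dead: Optional[bool] = None
        self._stamp = 0.0

    def _run(self) -> None:
        try:
            self._resolve()
            dead = False
        except Exception:
            dead = True
        with self._lock:
            self._dead, self._stamp = dead, time.monotonic()

    def is_dead(self) -> bool:
        with self._lock:
            if self._dead is not None and time.monotonic() - self._stamp < self._ttl:
                return self._dead  # recent answer, no new thread needed
            if self._thread is None or not self._thread.is_alive():
                # Start a new probe only when none is in flight.
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()
            thread = self._thread
        thread.join(self._timeout)
        with self._lock:
            # A probe still running after the deadline counts as dead.
            return True if thread.is_alive() else bool(self._dead)
```

During a resolver outage, repeated calls then wait on the same stuck thread instead of spawning a new one per model load.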


Comment on lines +129 to +131
if "HF_HUB_OFFLINE" in os.environ:
    yield False
    return


P2: Treat HF_HUB_OFFLINE as enabled only when truthy

This guard checks only for variable presence, so HF_HUB_OFFLINE=0 (a common explicit default) disables the DNS auto-detect path entirely. In that configuration, dead-DNS loads bypass the new offline short-circuit and fall back to slow network timeouts again, which defeats the regression fix this change is introducing.


rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 18, 2026
Two follow-ups from the review pass on unslothai#5512:

* ModelConfig.from_identifier no longer skips the remote LoRA-detect
  hf_model_info call when _env_offline() is true. huggingface_hub
  short-circuits the call via OfflineModeIsEnabled in ~0ms when
  HF_HUB_OFFLINE is set, so the original 25s concern was moot once
  routes/inference.py wrapped the call in _hf_offline_if_dns_dead.
  Skipping the API meant users with a cached LoRA adapter
  (adapter_config.json on disk) got is_lora=False and the load
  failed. After the API call (which raises fast offline) a new
  cache-fallback walks the HF cache snapshot for adapter_config.json
  via the existing _iter_hf_cache_snapshots helper.

* test_hf_model_info_not_called_when_offline replaced. The old test
  raised AssertionError inside production code that catches Exception,
  so it passed even if the call happened. New tests use MagicMock and
  assert call_count >= 1, plus a fixture that stages a fake HF cache
  with adapter_config.json to verify the offline cache detection.

Test count goes from 7 to 8 in test_offline_inference_parent.py.
Combined with test_offline_gguf_cache_fallback.py: 34 pass in 9.75s.
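
The test pitfall described in the second bullet can be reproduced in a few lines. The `detect_lora` function below is a hypothetical stand-in for production code that catches `Exception`; it is not the real `ModelConfig.from_identifier`.

```python
from unittest.mock import MagicMock


def detect_lora(identifier: str, hf_model_info) -> bool:
    """Stand-in for production code that swallows API errors."""
    try:
        files = hf_model_info(identifier)
        return "adapter_config.json" in files
    except Exception:
        # AssertionError is an Exception subclass, so it is swallowed too.
        return False


def _must_not_be_called(identifier: str):
    raise AssertionError("hf_model_info should not be called")


# Old style: the assertion fires but never reaches the test runner,
# so the test passes whether or not the call happened.
old_style_result = detect_lora("acme/demo-lora", _must_not_be_called)

# New style: record the call and assert on the count afterwards.
spy = MagicMock(return_value=["adapter_config.json"])
new_style_result = detect_lora("acme/demo-lora", spy)
```

With the spy, the assertion lives in the test body, outside the `try`/`except`, so a wrong call count actually fails the test.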
danielhanchen added a commit that referenced this pull request May 18, 2026
@danielhanchen danielhanchen force-pushed the offline-extend-to-parent-and-training branch from 003761c to 47b9e8c Compare May 18, 2026 04:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7f94471ec


# from_pretrained / snapshot_download / load_dataset calls resolve
# from cache instead of burning ~25s per call on retries. Scoped to
# this subprocess only (orchestrator spawns a fresh worker per run).
if "HF_HUB_OFFLINE" not in os.environ:


P2: Run DNS probe when HF_HUB_OFFLINE is falsy

This guard treats any presence of HF_HUB_OFFLINE as an opt-out, so common explicit defaults like HF_HUB_OFFLINE=0 skip the auto-detect path entirely. In that configuration, a dead resolver no longer flips the worker to offline mode and training falls back to slow Hugging Face/network timeouts, which defeats the latency regression fix this block is meant to provide for cached models.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6030709a47


Comment on lines +255 to +257
if _env_offline():
    _tokenizer_class_cache[model_name] = False
    return False


P2: Read cached configs before defaulting offline

When HF_HUB_OFFLINE is set for a cached remote model whose required transformers tier is only discoverable from tokenizer_config.json/config.json (for example a forked Gemma4-style repo without the hard-coded name substring), this new early return skips the cached HF snapshot and forces _check_tokenizer_config_needs_v5 to False; the duplicated guard in _check_config_needs_550 does the same. activate_transformers_for_subprocess() then selects the default 4.57.x tier even though the needed metadata is already on disk, so offline training/inference of those cached models can fail with the wrong transformers version instead of avoiding only the network fetch.
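
Reading the metadata from disk before falling back to a default might look like the sketch below. The helper name is hypothetical and the standard hub cache layout is assumed; which field of the parsed config decides the transformers tier is left to the caller, since that logic is not shown in this thread.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_cached_json(cache_dir: Path, repo_id: str, filename: str) -> Optional[dict]:
    """Return the parsed `filename` from any cached snapshot, else None."""
    snapshots = cache_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None
    for snap in sorted(snapshots.iterdir()):
        path = snap / filename
        if path.is_file():
            try:
                return json.loads(path.read_text(encoding="utf-8"))
            except (OSError, ValueError):
                continue  # unreadable or malformed: try the next snapshot
    return None


# Usage: a staged cache with one config.json (repo id is made up).
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "models--acme--demo-model" / "snapshots" / "abc123"
    snap.mkdir(parents=True)
    (snap / "config.json").write_text('{"model_type": "demo"}')
    cached = read_cached_json(Path(tmp), "acme/demo-model", "config.json")
    absent = read_cached_json(Path(tmp), "acme/demo-model", "tokenizer_config.json")
```

An offline check would consult this result first and only default to the lower tier when it returns `None`.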


danielhanchen and others added 6 commits May 18, 2026 05:40
#5505 fixed the GGUF/llama-server load path. Studio still has two
adjacent code paths that burn ~30-60s of soft-failed timeouts before
the worker subprocess starts when DNS to huggingface.co is dead and
the model is already in the local HF cache.

Inference parent process (routes/inference.py:load_model):

* ModelConfig.from_identifier now runs inside _hf_offline_if_dns_dead
  so the LoRA-detect hf_model_info call and the urllib config probes
  in utils/transformers_version.py short-circuit when DNS is dead.
* utils/models/model_config.py: extracted the inline HF_HUB_OFFLINE/
  TRANSFORMERS_OFFLINE check used by list_gguf_variants and
  detect_gguf_model_remote into a shared _env_offline() helper, then
  reused it to gate the LoRA-detect hf_model_info call.
* utils/transformers_version.py: _check_tokenizer_config_needs_v5 and
  _check_config_needs_550 now early-return False when offline instead
  of issuing a 10s urllib.urlopen against huggingface.co/raw/main.

Training worker (core/training/worker.py:run_training_process):

* Add the same 2s DNS probe used by core/inference/worker.py at the
  top of the training subprocess. On failure, set HF_HUB_OFFLINE,
  TRANSFORMERS_OFFLINE, and HF_DATASETS_OFFLINE before the rest of
  the subprocess imports torch/transformers/unsloth, so every
  from_pretrained, snapshot_download, and load_dataset call below
  resolves from cache. Scope is per-subprocess; the orchestrator
  always spawns a fresh worker per training run.

Training trainer (core/training/trainer.py:load_model):

* Skip the proactive hf_model_info gated-repo probe when _env_offline()
  is true. The API is unreachable anyway, and a gated model that is
  already cached is exactly the scenario the user is trying to train
  against. from_pretrained surfaces the real error if access is
  actually denied.

Tests (tests/test_offline_inference_parent.py, 7 new cases):

* _env_offline truthy/falsy parsing across HF_HUB_OFFLINE and
  TRANSFORMERS_OFFLINE.
* transformers_version urllib short-circuit when offline.
* LoRA detect hf_model_info skip when offline.

Existing tests/test_offline_gguf_cache_fallback.py still passes
(26 cases) because the inline env check was extracted, not changed.

The studio test stub convention only included the 6 httpx exception
names that existing callers needed. Newer huggingface_hub (1.15+)
imports HTTPError, Response, Request, HTTPStatusError, AsyncClient,
and more at module import time. When httpx is truly absent the stub
chase becomes a treadmill.

Use the real package when installed (the CI install list already
includes httpx, so this is the production environment). Fall back to
the stub only when httpx is genuinely missing.
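
The "real package first, stub only when missing" arrangement could be sketched like this. The helper name, the `__is_stub__` marker, and the stub attributes are all hypothetical; the actual test setup in the repository may differ.

```python
import importlib.util
import sys
import types


def ensure_module(name: str, stub_attrs: dict) -> bool:
    """Install a stub for `name` only when the real package is missing.

    Returns True when the real package is available.
    """
    if name in sys.modules:
        # Already imported or already stubbed by an earlier call.
        return not getattr(sys.modules[name], "__is_stub__", False)
    if importlib.util.find_spec(name) is not None:
        return True  # installed: let normal imports pick up the real one
    stub = types.ModuleType(name)
    stub.__is_stub__ = True
    for attr, value in stub_attrs.items():
        setattr(stub, attr, value)
    sys.modules[name] = stub
    return False
```

With the real package installed, nothing is replaced, so new names imported by a newer dependency resolve without extending the stub.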

No code under test changes.

Two follow-ups from the review pass on #5512:

* ModelConfig.from_identifier no longer skips the remote LoRA-detect
  hf_model_info call when _env_offline() is true. huggingface_hub
  short-circuits the call via OfflineModeIsEnabled in ~0ms when
  HF_HUB_OFFLINE is set, so the original 25s concern was moot once
  routes/inference.py wrapped the call in _hf_offline_if_dns_dead.
  Skipping the API meant users with a cached LoRA adapter
  (adapter_config.json on disk) got is_lora=False and the load
  failed. After the API call (which raises fast offline) a new
  cache-fallback walks the HF cache snapshot for adapter_config.json
  via the existing _iter_hf_cache_snapshots helper.

* test_hf_model_info_not_called_when_offline replaced. The old test
  raised AssertionError inside production code that catches Exception,
  so it passed even if the call happened. New tests use MagicMock and
  assert call_count >= 1, plus a fixture that stages a fake HF cache
  with adapter_config.json to verify the offline cache detection.

Test count goes from 7 to 8 in test_offline_inference_parent.py.
Combined with test_offline_gguf_cache_fallback.py: 34 pass in 9.75s.

Same fix as #5505's _probe_dns_dead refactor: run gethostbyname on a
daemon thread with join timeout so concurrent sockets in the parent
interpreter never inherit a process-wide socket.setdefaulttimeout
mutation. Adds a static-pin regression test that the inference parent
file does not regress on this.

Shorten the longer explanatory comments added by this PR while keeping
the WHY of each non-obvious branch:

- trainer.py: collapse the 5-line proactive gated-check comment.
- training/worker.py: trim the offline auto-detect preamble and the
  "logger isn't configured" note.
- routes/inference.py: shorten the DNS-probe wrap rationale.
- transformers_version.py: collapse the two urllib short-circuit notes.
- model_config.py: shorten the LoRA detect + cache-fallback notes.
- tests/test_offline_inference_parent.py: tighter module docstring,
  trim class docstrings, drop multi-line explainer comments inside the
  tests; behaviour and coverage unchanged (9/9 tests still pass).
@danielhanchen danielhanchen force-pushed the offline-extend-to-parent-and-training branch from 6030709 to 72ff31a Compare May 18, 2026 05:41

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72ff31a15d



# Offline: skip the 10s urllib fetch (fail-open to lower tier).
if _env_offline():
    _config_needs_550_cache[model_name] = False


P2: Don't cache transient offline tier defaults

When _hf_offline_if_dns_dead() temporarily enables offline mode for a parent-process load and this check runs for a remote model without a local config.json, this stores False in the process-wide _config_needs_550_cache. Because the cache is consulted before any later local/network read, the same model remains classified as not needing transformers 5.5.0 even after DNS recovers, so the inference orchestrator/vision detection can keep spawning workers with the wrong transformers tier. The offline shortcut should avoid populating the normal cache, or key the cached value on the offline state.


@danielhanchen danielhanchen merged commit 3876c87 into main May 18, 2026
32 of 33 checks passed
@danielhanchen danielhanchen deleted the offline-extend-to-parent-and-training branch May 18, 2026 07:31